Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cp: Implement --sparse flag #3766

Merged
merged 3 commits into from
Aug 4, 2022
Merged

Conversation

pimzero
Copy link
Contributor

@pimzero pimzero commented Aug 1, 2022

This begins to address #3362 (the logic behind the --sparse=always flag, for linux only at the moment)

This PR does 2 things:

  • Refactor --sparse code path to ~match --reflink one
  • Implement the --sparse=always + handling of --sparse with --reflink

The --sparse=auto needs a bit more logic (as documented in the commit message).

I think all the information needed (rationale, logic, future directions) is already explained in the commit, so copying them here:

1 - cp: Refactor reflink/sparse handling to enable --sparse flag

--sparse and --reflink options have a lot of similarities:

  • They have similar options (always, never, auto)
  • Both need OS specific handling
  • They can be mutually exclusive

Prior to this change, sparse was defined as CopyMode, but reflink wasn't. Given the similarities, it makes sense to handle them similarly.

The idea behind this change is to move all OS specific file copy handling in the copy_on_write_* functions. Those function then dispatch to the correct logic depending on the arguments (at the moment, the tuple (reflink, sparse)).

Also, move the handling of --reflink=never from copy_file to the copy_on_write_* functions, at the cost of a bit of code duplication, to allow copy_on_write_* to handle all cases (and later handle --reflink=never with --sparse).

2 - cp: Implement --sparse flag

This begins to address #3362

At the moment, only the --sparse=always logic matches the requirement form GNU cp info page, i.e. always make holes in destination when possible.

Sparse copy is done by copying the source to the destination block by block (blocks being of the destination's fs block size). If the block only holds NUL bytes, we don't write to the destination.

About --sparse=auto: according to GNU cp info page, the destination file will be made sparse if the source file is sparse as well. The next step are likely to use lseek with SEEK_HOLE detect if the source file has holes. Currently, this has the same behaviour as --sparse=never. This SEEK_HOLE logic can also be applied to --sparse=always to improve performance when copying sparse files.

About --sparse=never: from my understanding, it is not guaranteed that Rust's fs::copy will always produce a file with no holes, as "platform-specific behavior may change in the future"

About other platforms:

  • macos: The solution may be to use fcntl command F_PUNCHHOLE.
  • windows: I only see FSCTL_SET_SPARSE.

This should pass the following GNU tests:

  • tests/cp/sparse.sh
  • tests/cp/sparse-2.sh
  • tests/cp/sparse-extents.sh
  • tests/cp/sparse-extents-2.sh

sparse-perf.sh needs --sparse=auto, and in particular a way to skip holes in the source file.

sparse_mode: SparseMode,
context: &str,
) -> CopyResult<()> {
if (reflink_mode != ReflinkMode::Never) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (reflink_mode != ReflinkMode::Never) {
if reflink_mode != ReflinkMode::Never {

src/uu/cp/src/cp.rs Outdated Show resolved Hide resolved
src/uu/cp/src/cp.rs Outdated Show resolved Hide resolved
@pimzero pimzero force-pushed the 3362-sparse-clean branch 2 times, most recently from 7fd289f to 34b6cbc Compare August 2, 2022 16:50
src/uu/cp/src/cp.rs Outdated Show resolved Hide resolved
@pimzero pimzero force-pushed the 3362-sparse-clean branch from 5be06c8 to 53ae311 Compare August 2, 2022 20:17
@sylvestre
Copy link
Contributor

Small conflict :)

pimzero added 2 commits August 3, 2022 16:02
`--sparse` and `--reflink` options have a lot of similarities:
 - They have similar options (`always`, `never`, `auto`)
 - Both need OS specific handling
 - They can be mutually exclusive

Prior to this change, `sparse` was defined as `CopyMode`, but `reflink`
wasn't. Given the similarities, it makes sense to handle them similarly.

The idea behind this change is to move all OS specific file copy
handling in the `copy_on_write_*` functions. Those function then
dispatch to the correct logic depending on the arguments (at the moment,
the tuple `(reflink, sparse)`).

Also, move the handling of `--reflink=never` from `copy_file` to the
`copy_on_write_*` functions, at the cost of a bit of code duplication,
to allow `copy_on_write_*` to handle all cases (and later handle
`--reflink=never` with `--sparse`).
This begins to address uutils#3362

At the moment, only the `--sparse=always` logic matches the requirement
form GNU cp info page, i.e. always make holes in destination when
possible.

Sparse copy is done by copying the source to the destination block by
block (blocks being of the destination's fs block size). If the block
only holds NUL bytes, we don't write to the destination.

About `--sparse=auto`: according to GNU cp info page, the destination
file will be made sparse if the source file is sparse as well. The next
step are likely to use `lseek` with `SEEK_HOLE` detect if the source
file has holes. Currently, this has the same behaviour as
`--sparse=never`. This `SEEK_HOLE` logic can also be applied to
`--sparse=always` to improve performance when copying sparse files.

About `--sparse=never`: from my understanding, it is not guaranteed that
Rust's `fs::copy` will always produce a file with no holes, as
["platform-specific behavior may change in the
future"](https://doc.rust-lang.org/std/fs/fn.copy.html#platform-specific-behavior)

About other platforms:
 - `macos`: The solution may be to use `fcntl` command `F_PUNCHHOLE`.
 - `windows`: I only see `FSCTL_SET_SPARSE`.

This should pass the following GNU tests:
 - `tests/cp/sparse.sh`
 - `tests/cp/sparse-2.sh`
 - `tests/cp/sparse-extents.sh`
 - `tests/cp/sparse-extents-2.sh`

`sparse-perf.sh` needs `--sparse=auto`, and in particular a way to skip
holes in the source file.
@pimzero pimzero force-pushed the 3362-sparse-clean branch from 53ae311 to b8fc067 Compare August 3, 2022 15:55
@sylvestre
Copy link
Contributor

Indeed, bravo :)

Warning: Congrats! The gnu test tests/cp/reflink-auto is no longer failing!
Warning: Congrats! The gnu test tests/cp/sparse-2 is no longer failing!
Warning: Congrats! The gnu test tests/cp/sparse-extents-2 is no longer failing!
Warning: Congrats! The gnu test tests/cp/sparse is no longer ERROR!
Warning: Congrats! The gnu test tests/cp/sparse-extents is no longer ERROR!

@sylvestre sylvestre merged commit e199152 into uutils:main Aug 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants